sureg with approx. 5,000 equations | Stata/MP 17.0 crashes

Peter Gaertner

Join Date: Jan 2020

Posts: 8
#1

14 Jan 2024, 05:52

Hello,

I am currently working on an event study, which includes a little bit over 5,000 firms and five distinct events. I want to use Stata's sureg command to run a seemingly unrelated regression for all of my sample's firms.

I created a loop for the sureg command, which looks as follows:

Code:

local eqs ""
foreach var of varlist firm1 - firm5000 {
    local eqs "`eqs' (ER_`var' premium_m event_1 event_2 event_3 event_4 event_5)"
}
sureg `eqs', notable
test event_1 event_2 event_3 event_4 event_5

"ER_`var'" is the excess return of the firm on a given day, "premium_m" denotes the market premium on that day and "event_X" is a dummy variable indicating whether that day is considered an event day.

The above-mentioned code works and runs smoothly if I use only a couple of firms. However, if I run the code with all 5,000 variables, Stata crashes after a while.

My question is therefore whether there is any way to prevent this, or to adapt my code so that I can run the seemingly unrelated regression. I am using Stata/MP 17.0.

Thank you very much in advance for your help.

Kind regards,
Peter
Daniel Feenberg

Join Date: Oct 2014

Posts: 321
#2

14 Jan 2024, 08:42

-man limits- suggests that the maximum number of equations in sureg is 65,532. Could you be running out of memory? What happens if you use Task Manager or top to view how much memory is being used as the procedure runs?
Peter Gaertner

Join Date: Jan 2020

Posts: 8
#3

14 Jan 2024, 10:00

I used Task Manager while the code ran to monitor how much CPU and memory Stata used. CPU usage peaked at roughly 26% and memory usage at approx. 9,000 MB. Most of the time, however, CPU usage was around 2% and memory usage around 500 MB.

At one point Task Manager starts to indicate that Stata is inactive, and after a couple of minutes I get a pop-up window which states (translated into English): "Stata does not work anymore. A problem has prevented this program from running correctly. Close the program."

However, I use a server of my institution which should be equipped with enough memory to run such calculations...
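
A rough back-of-the-envelope check suggests that the memory peak reported above is about the size one would expect from the coefficient covariance matrix alone. The figures below are assumptions for illustration: 7 coefficients per equation (the six regressors plus a constant), 5,000 equations, and 8-byte double-precision storage. This is an estimate, not a diagnosis of the crash.

\[
p = 5{,}000 \times 7 = 35{,}000, \qquad p^{2} \times 8 \text{ bytes} = 1.225 \times 10^{9} \times 8 \text{ bytes} = 9.8 \times 10^{9} \text{ bytes} \approx 9.8 \text{ GB}
\]

The \(5{,}000 \times 5{,}000\) residual covariance matrix would add a further \(5{,}000^{2} \times 8 \text{ bytes} = 0.2\) GB. Any working copies of these matrices made during estimation would come on top of that.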
Andrew Musau

Join Date: Oct 2014

Posts: 9958
#4

14 Jan 2024, 14:06

The \(X\) variables in your example are common across equations. In this case, SURE does not give efficiency gains over equation-by-equation OLS. Therefore, check whether mvreg runs.

Code:

help mvreg

Last edited by Andrew Musau; 14 Jan 2024, 14:11.
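
The equivalence stated in this post can be sketched with the standard stacked notation. The symbols are introduced here for illustration and are not from the thread: \(N\) equations, \(T\) observations, a common \(T \times k\) regressor matrix \(X\), the stacked \(NT \times 1\) outcome vector \(y\), and a nonsingular \(N \times N\) error covariance matrix \(\Sigma\).

\[
\begin{aligned}
\hat\beta_{\text{GLS}}
&= \left[(I_N \otimes X)'(\Sigma^{-1} \otimes I_T)(I_N \otimes X)\right]^{-1}(I_N \otimes X)'(\Sigma^{-1} \otimes I_T)\,y \\
&= \left[\Sigma^{-1} \otimes X'X\right]^{-1}\left(\Sigma^{-1} \otimes X'\right)y \\
&= \left(\Sigma \otimes (X'X)^{-1}\right)\left(\Sigma^{-1} \otimes X'\right)y \\
&= \left(I_N \otimes (X'X)^{-1}X'\right)y
\end{aligned}
\]

The last line is OLS applied to each equation separately: \(\Sigma\) cancels, so the point estimates do not depend on the cross-equation error covariance when every equation has the same regressors.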
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#5

14 Jan 2024, 21:38

I don't see why you're using sureg. You seem to have a panel data set with N = 5,000 firms. How many time periods? Is this like a staggered intervention of the same policy, or different kinds of events? I think you might mean the event happens at five different periods.

Showing a summary of your data would help. But I'm pretty sure you don't want to use sureg. The statistical properties in such a setting would be very poor.
Peter Gaertner

Join Date: Jan 2020

Posts: 8
#6

15 Jan 2024, 07:39

First of all, thank you very much for taking the time to try to help me with my problem.

Andrew Musau: Using Stata's mvreg command worked and Stata did not crash as with the sureg command. I created the following loop and afterwards tested the significance of the five events with the test command:

Code:

foreach var of varlist firm1 - firm5000 {
    mvreg ER_`var' = premium_m event_1 event_2 event_3 event_4 event_5, notable
}
test event_1 event_2 event_3 event_4 event_5

However, the Wald test at the end takes only the last regression into account and does not test the joint significance of the five events across all of the sample's firms.

Jeff Wooldridge: Yes, I have a panel data set with N = 5,000 firms and 402 time periods (dates). The time periods range from 252 trading days prior to the first event to 1 trading day after the last event, as I use an event window of [-1;+1] in my baseline analysis. I want to examine the capital market's reaction to the implementation of a certain policy on affected firms across different kinds of events. Concretely, I want to analyze how the stock prices of affected firms responded to major developments leading to the implementation of the policy. The five events take place within a timeframe of seven months and always fall on the same day for all of the firms in my sample. This is also the reason why I used sureg: I assumed that seemingly unrelated regressions can cope with the strong dependence structure of the error terms across firms at fixed points in time.

Please find below a summary of my data. I limited the summary to three firms in the sample for a better overview. In this example, the event date for the first event is 4/5/2021, and 4/2/2021 and 4/6/2021 are the trading days before and after the event, respectively. ER_firm1 to ER_firm3 denote the excess returns (returns in excess of the risk-free return) of those firms, while premium_m is the market premium.

Code:

date       event_1   event_2  event_3  event_4  event_5  ER_firm1   ER_firm2   ER_firm3   premium_m
3/31/2021  0         0        0        0        0        .0077434   -.006871   -.0821487  .0025371
4/1/2021   0         0        0        0        0        .0024686   -.007381   -.0290774  .0100744
4/2/2021   .3333333  0        0        0        0        .007576    0          0          .0005987
4/5/2021   .3333333  0        0        0        0        .0200559   -.0069713  .0000899   .0110653
4/6/2021   .3333333  0        0        0        0        .0099254   -.014759   -.1250727  .001308
4/7/2021   0         0        0        0        0        -.0024645  0          .1534356   .0017271

Once again, thank you very much for your help!
Andrew Musau

Join Date: Oct 2014

Posts: 9958
#7

16 Jan 2024, 03:32

foreach var of varlist firm1 - firm5000 {
mvreg ER_`var' = premium_m event_1 event_2 event_3 event_4 event_5, notable
}

You are using mvreg here as regress. You need to enter all outcomes on the left-hand side of the equal sign.

Code:

local outcomes
forval i=1/5000{
    local outcomes `outcomes' ER_firm`i'
}
mvreg `outcomes'= premium_m event_1 event_2 event_3 event_4 event_5, notable
test event_1 event_2 event_3 event_4 event_5

But as Jeff points out, sureg is a large \(T\) panel estimator, whereas you have \(N>T\).
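
One way to see why \(N > T\) is a problem for the feasible GLS step is a rank argument. The notation is introduced here for illustration: \(\hat E\) is the \(T \times N\) matrix of OLS residuals, \(Y\) the \(T \times N\) matrix of outcomes, and \(M_X = I_T - X(X'X)^{-1}X'\). The value \(k = 7\) assumes six regressors plus a constant; \(T = 402\) and \(N \approx 5{,}000\) are the figures given in #6.

\[
\hat\Sigma = \frac{1}{T}\,\hat E'\hat E, \qquad \hat E = M_X Y, \qquad
\operatorname{rank}(\hat\Sigma) \le \operatorname{rank}(M_X) = T - k = 402 - 7 = 395 < 5{,}000
\]

So the estimated \(N \times N\) error covariance matrix is singular, and the inverse \(\hat\Sigma^{-1}\) that the GLS weighting calls for does not exist.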
Peter Gaertner

Join Date: Jan 2020

Posts: 8
#8

18 Jan 2024, 13:12

Thank you, Andrew Musau, for your kind reply and help! The code you provided also worked when I tested it with just a couple of my sample's firms. However, with all 5,043 firms in my sample (i.e., 5,043 outcome variables, the excess returns of these firms), the mvreg regression has now been running for over 24 hours and is still not finished. I find this odd, as my dataset is "only" 31.03 MB in size, and I do not know whether this is my fault or normal.

With regard to the helpful comment that I should use mvreg instead of sureg as N > T in my case: I am honestly still trying to understand why I can use mvreg in this case and why this command is suitable for cases of large N and hence what distinguishes these two commands.
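
Since the regressors are the same in every equation, the point estimates can also be obtained one firm at a time with regress, which avoids the very large joint system. The following is a minimal sketch under stated assumptions: the outcomes are assumed to be named ER_firm1 through ER_firm5043 with consecutive numbering, and the file name firm_event_coefs and the variable names b_event1 to b_event5 are hypothetical. It collects coefficients only and does not by itself provide a joint test across firms.

```stata
* Sketch only: equation-by-equation OLS, storing the event coefficients per firm.
* Assumes outcomes are named ER_firm1 ... ER_firm5043 (hypothetical numbering).
* firm_event_coefs.dta is a hypothetical output file name.
* Run from a do-file: /// line continuation does not work interactively.
tempname results
postfile `results' int firm double(b_event1 b_event2 b_event3 b_event4 b_event5) ///
    using firm_event_coefs, replace
forvalues i = 1/5043 {
    quietly regress ER_firm`i' premium_m event_1 event_2 event_3 event_4 event_5
    post `results' (`i') (_b[event_1]) (_b[event_2]) (_b[event_3]) ///
        (_b[event_4]) (_b[event_5])
}
postclose `results'

* Inspect the stored coefficients without losing the data in memory.
preserve
use firm_event_coefs, clear
summarize b_event1-b_event5
restore
```

Each regression here involves only 402 observations and 7 coefficients, so the loop should be quick compared with a single 5,043-equation system.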
Andrew Musau

Join Date: Oct 2014

Posts: 9958
#9

19 Jan 2024, 00:19

Originally posted by Peter Gaertner:

With regard to the helpful comment that I should use mvreg instead of sureg as N > T in my case: I am honestly still trying to understand why I can use mvreg in this case and why this command is suitable for cases of large N and hence what distinguishes these two commands.

You misunderstood my reply in #4. I did not state that with \(N>T\), you should use mvreg. I stated that if you have the same RHS variables across equations, there is no efficiency to be gained by using sureg over mvreg. So, in a sense, these two estimators are equivalent in this instance. With such a panel, a two-way FE model (firm and time fixed effects) would seem appropriate, clustering the standard errors at the firm level.

Code:

help xtreg
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#10

19 Jan 2024, 08:07

If the goal is to account for any correlation in the errors across firms -- something I think is oversold because unmeasured slope heterogeneity is easily mistaken for cross-sectional dependence even under random sampling -- you can use the user-written xtscc. This works provided T is suitably large to do large T asymptotics. It computes Newey-West standard errors on the data after it is collapsed to a time series. You can include firm fixed effects, too.
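
A minimal sketch of how this could look with the wide data layout shown in #6. Several things here are assumptions rather than details from the thread: that date is a numeric Stata daily date with one row per trading day, that the outcomes are named ER_firm1, ER_firm2, and so on, that no variables named t or firm already exist, and that lag(4) is a sensible lag length (it is only a placeholder). xtscc reports Driscoll-Kraay standard errors.

```stata
* Sketch only: reshape to long form and estimate with xtscc.
* Assumes one row per trading day and outcomes named ER_firm1, ER_firm2, ...
* t, firm, and lag(4) are illustrative choices, not values from the thread.
ssc install xtscc, replace             // user-written command; install once

sort date
generate long t = _n                   // consecutive trading-day index (no weekend gaps)
reshape long ER_firm, i(t) j(firm)     // creates the firm id and the stacked outcome ER_firm
xtset firm t

* Firm fixed effects with standard errors robust to cross-sectional dependence.
xtscc ER_firm premium_m event_1 event_2 event_3 event_4 event_5, fe lag(4)
test event_1 event_2 event_3 event_4 event_5
```

This pools the slope on premium_m and the event effects across firms, which is a different model from the firm-by-firm regressions earlier in the thread.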